AITopics

2607.01492

Country: Europe > France (0.28)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.67)

Chandramoorthy, Nisha, Sanz-Alonso, Daniel, Waniorek, Nathan

From Spectral Methods to Sample Complexity Bounds for Fourier Neural Operators

arXiv.org Machine Learning, Jul-2-2026

We establish approximation and learning guarantees for Fourier neural operators (FNOs) applied to time-$T$ solution operators of dissipative evolution equations. The analysis builds on the premise that FNOs can efficiently approximate and learn solution operators whenever these operators admit stable and accurate spectral discretizations. To formalize this idea, we introduce classes of evolution operators defined through spectral methods and derive FNO approximation bounds and polynomial sample complexity guarantees for these classes. For equations with polynomial nonlinearities, the learning rates depend primarily on the smoothness of the input space and the dimension of the physical domain. Our results hold uniformly over broad families of dissipative equations, rather than for a single fixed PDE, and apply in particular to the Navier--Stokes, Allen--Cahn, and Cahn--Hilliard equations. For equations with non-polynomial smooth nonlinearities, we prove that polynomial sample complexity still holds with rates that now additionally depend on the smoothness of the nonlinear terms and the dissipation strength. Overall, we connect classical spectral approximation theory with modern operator learning and explain when FNOs can learn nonlinear evolution operators efficiently.

artificial intelligence, machine learning, operator, (18 more...)

2607.0032

Country: North America > United States > Illinois > Cook County > Chicago (0.40)

Genre: Research Report (0.64)

Technology:

Information Technology > Mathematics of Computing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

arXiv.org Machine Learning, Jul-1-2026

Random Reshuffling Dominates Stochastic Gradient Descent

Liu, Zijian

Stochastic Gradient Descent ($\textsf{SGD}$) is one of the most classical optimization algorithms with favorable theoretical guarantees, yet the practical implementation of $\textsf{SGD}$ differs subtly from its well-known form and is often referred to as Shuffling Stochastic Gradient Descent ($\textsf{Shuffling SGD}$). A particularly popular strategy in $\textsf{Shuffling SGD}$ is Random Reshuffling ($\textsf{RR}$), which has achieved great empirical success across numerous experiments. Despite its strong performance, $\textsf{RR}$ has long been considered a heuristic due to a lack of theoretical support. Over the last decade, people have finally established provable convergence rates for $\textsf{RR}$, thus justifying its observed superiority. However, for smooth convex optimization, two clouds over the convergence theory of $\textsf{RR}$ remain to this day. More precisely, according to the current theory, $\textsf{Shuffling SGD}$ under $\textsf{RR}$ converges only when the stepsize is smaller than a threshold proportional to $1/n$, where $n$ is the number of summands in the objective (or the number of data points). Consequently, the optimally tuned theoretical rate of $\textsf{Shuffling SGD}$ under $\textsf{RR}$ is strictly worse than that of $\textsf{SGD}$ when the number of epochs is smaller than another threshold proportional to $n$. These two restrictions heavily limit the applicability of existing theories and leave a critical mismatch with practice. In this work, for the first time, we prove that $\textsf{RR}$ dominates $\textsf{SGD}$ in smooth convex optimization under any reasonable stepsize after any finite number of epochs, thereby addressing a longstanding open question.
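The difference between the two sampling schemes named in the abstract can be sketched in a few lines. This is only an illustration under stated assumptions, not the paper's analysis: the function names (`sgd_indices`, `rr_indices`, `run`) and the toy objective $f(x) = \frac{1}{n}\sum_i \frac{1}{2}(x - a_i)^2$ are hypothetical choices made here.

```python
import random

def sgd_indices(n, epochs, rng):
    """With-replacement sampling: every step draws an index uniformly at random."""
    return [rng.randrange(n) for _ in range(n * epochs)]

def rr_indices(n, epochs, rng):
    """Random Reshuffling: draw a fresh permutation at the start of every epoch."""
    order = []
    for _ in range(epochs):
        perm = list(range(n))
        rng.shuffle(perm)
        order.extend(perm)
    return order

def run(indices, a, x0, stepsize):
    """Stochastic gradient steps on the toy objective (1/n) * sum_i 0.5 * (x - a[i])**2."""
    x = x0
    for i in indices:
        # The gradient of the i-th summand 0.5 * (x - a[i])**2 is (x - a[i]).
        x -= stepsize * (x - a[i])
    return x

if __name__ == "__main__":
    rng = random.Random(0)
    a = [rng.gauss(0.0, 1.0) for _ in range(20)]
    print("with replacement:", run(sgd_indices(len(a), 5, rng), a, 0.0, 0.05))
    print("random reshuffling:", run(rr_indices(len(a), 5, rng), a, 0.0, 0.05))
```

Both schemes perform the same number of gradient steps; the only difference is that under RR each summand is visited exactly once per epoch.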

artificial intelligence, machine learning, shuffling sgd, (16 more...)

2606.32005

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Das, Sayan, Yaghooti, Bahram, Kuffner, Todd A., Lahiri, Soumendra N.

On Optimal Data Splitting for Split Conformal Prediction

arXiv.org Machine Learning, Jul-1-2026

Conformal prediction and its variants, including split conformal prediction, provide a distribution-free framework for uncertainty quantification by constructing prediction intervals or sets with finite-sample coverage guarantees. The statistical efficiency of these intervals depends critically on how the data are split into training and calibration samples. Despite its practical importance, a principled characterization of the training-calibration split that minimizes prediction interval length while maintaining coverage has remained largely unresolved. In this paper, we develop a theoretical framework for optimal data splitting in split conformal prediction. We first analyze the problem in a general setting and derive analytical characterizations of the length-optimal split ratio under both symmetric and asymmetric regimes. We then show how the general results specialize to several commonly used regression settings, including linear regression, nonparametric regression, and neural networks, thereby demonstrating the scope of the framework. We also describe a data-based method for selecting the optimal proportion. Our analysis clarifies how model-related features govern the optimal allocation of samples between training and calibration and provides principled guidance for constructing shorter prediction intervals. Experiments on both synthetic and real-world datasets demonstrate the applicability of the proposed methodology across a variety of practical scenarios.
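For readers unfamiliar with the procedure, the following is a minimal sketch of standard split conformal prediction with absolute-residual scores, showing where the training/calibration split enters. It does not implement the paper's optimal-split rule. The helper names (`fit_line`, `split_conformal`), the `train_fraction` parameter, and the least-squares line used as the regressor are assumptions made for illustration.

```python
import math
import random

def fit_line(xs, ys):
    """Least-squares line through the training points (a stand-in for any regressor)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)

def split_conformal(xs, ys, train_fraction, alpha, rng):
    """Return (predict, q); the interval at x is [predict(x) - q, predict(x) + q]."""
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    n_train = int(train_fraction * len(idx))
    train, calib = idx[:n_train], idx[n_train:]
    predict = fit_line([xs[i] for i in train], [ys[i] for i in train])
    # Calibration scores: absolute residuals on data the model has not seen.
    scores = sorted(abs(ys[i] - predict(xs[i])) for i in calib)
    # Finite-sample quantile index ceil((1 - alpha) * (n_calib + 1)).
    k = math.ceil((1 - alpha) * (len(scores) + 1))
    q = scores[k - 1] if k <= len(scores) else math.inf
    return predict, q

if __name__ == "__main__":
    rng = random.Random(1)
    xs = [float(i) for i in range(200)]
    ys = [2.0 * x + rng.gauss(0.0, 1.0) for x in xs]
    predict, q = split_conformal(xs, ys, train_fraction=0.5, alpha=0.1, rng=rng)
    print("half-width:", q)
```

A larger `train_fraction` gives the regressor more data but leaves fewer calibration points for the quantile; with too few calibration points the half-width becomes infinite. This trade-off is the quantity the abstract proposes to optimize.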

machine learning, natural language, prediction, (18 more...)

2606.316

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.69)

Dereziński, Michał, Dong, Xiaoyu

How AI settled the complexity of the oldest SGD algorithm

An essential catalyst for the remarkable breakthroughs in AI that led to the modern large language models (LLMs) such as ChatGPT and Gemini has been the algorithms used to train these models on massive datasets. While the LLM architectures have gotten progressively more complex, the training algorithms have stayed relatively simple, and in fact, they have all been based on the decades-old paradigm of stochastic gradient descent (SGD). The key idea behind SGD is that in order to minimize a certain objective function (such as an LLM's error on the training data), it suffices to access only a noisy estimate of that objective at any given time (e.g., based on a small sample of the data) while making incremental progress towards the solution. This is essential for LLM training, as the datasets have become so massive one could not hope to perform computations on everything all at once. Commonly attributed to a 1951 paper by Robbins and Monro [34], SGD has seen a resurgence of interest over the last 20 years by AI researchers and computer scientists striving to understand its effectiveness, leading to numerous variants and extensions used in modern LLMs [12, 9], most notably the Adam algorithm [25]. As a result, we have gained a robust mathematical understanding of the computational complexity of SGD algorithms in a wide range of settings (e.g., see [11, 15, 5, 17]). Yet, despite this progress there is a surprising gap in the understanding of SGD: The complexity of an algorithm proposed by Stefan Kaczmarz in 1937 [24] for solving a system of linear equations - the oldest published example of an SGD algorithm, which predates Robbins and Monro's paper by over a decade - has not been settled.
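As a rough illustration of why the Kaczmarz method can be read as an SGD algorithm, the sketch below implements the classical cyclic variant for a consistent system $Ax = b$. Each update is a gradient step on the single-equation loss $\frac{1}{2}(a_i \cdot x - b_i)^2$ with stepsize $1/\|a_i\|^2$, which projects the iterate onto the hyperplane $a_i \cdot x = b_i$. The function name `kaczmarz` and the `sweeps` parameter are hypothetical, and nothing here reflects the complexity result the article discusses.

```python
def kaczmarz(A, b, sweeps):
    """Cyclic Kaczmarz: project the iterate onto one equation a_i . x = b_i at a time."""
    x = [0.0] * len(A[0])
    for _ in range(sweeps):
        for row, rhs in zip(A, b):
            residual = rhs - sum(r * xi for r, xi in zip(row, x))
            norm_sq = sum(r * r for r in row)
            # Gradient step on 0.5 * (row . x - rhs)**2 with stepsize 1 / ||row||^2.
            x = [xi + residual / norm_sq * r for xi, r in zip(x, row)]
    return x

if __name__ == "__main__":
    # 2x + y = 3 and x + 3y = 5 have the unique solution x = 0.8, y = 1.4.
    print(kaczmarz([[2.0, 1.0], [1.0, 3.0]], [3.0, 5.0], sweeps=100))
```

A randomized variant would pick the row at random at each step instead of cycling through the rows in order.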

large language model, machine learning, natural language, (22 more...)

2606.29593

Country: North America > United States (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.56)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Liquidity-Based Audit of Algorithmic Trading Strategies

Aldridge, Irene

Market microstructure has long classified trading activity by its informational role: an informed trader demands liquidity by trading in the direction of private information, while a market maker supplies liquidity by absorbing that order flow and earning the spread in compensation (Kyle, 1985; Glosten and Milgrom, 1985). This classification is typically recovered from the data the classifier requires: signed order flow, quote revisions, or the sequential-trade structure of the market. The classification is harder to apply to an algorithmic strategy whose internal logic is unobservable. However, the signals or optimization problems generating the decisions of a typical quantitative fund are not visible, even though the trades and reported positions may be available. This paper shows that the liquidity role of such a strategy (consumer or provider) can be recovered from realized portfolio costs and trade decisions alone, without observing quotes, order flow, or any other microstructure-specific signal.

artificial intelligence, correction, machine learning, (19 more...)

2606.29018

Country: North America > United States (0.28)

Genre: Research Report (0.82)

Industry: Banking & Finance > Trading (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.48)

Huang, Yuqi, Hou, Yunlong, Tan, Vincent Y. F.

Bayesian Best-Arm Identification with Abstention: A Polynomial-to-Exponential Phase Transition

We study the Bayesian fixed-budget best-arm identification problem in which a learner can abstain from making a terminal recommendation. Subject to an abstention budget $α$, we analyze the probability of undetected error--the risk of recommending a suboptimal arm without abstaining. Our central finding is that abstention induces a phase transition: without abstention, the error probability decays polynomially in the sampling budget $T$; in contrast, introducing any small positive abstention budget shifts this to an exponential decay. For Gaussian priors and rewards, in the regime $T\to\infty$ followed by $α\downarrow0$, we establish exact matching information-theoretic lower bounds and algorithmic upper bounds on the optimal error exponent, which takes the form $\exp(-\frac{α^{2}T}{8κ_ν^{2}})$. The hardness parameter $κ_ν$ represents the prior density of the top-two gap at zero, highlighting that nearly tied instances drive the fundamental error. We introduce an adaptive algorithm, PGWS, that successfully achieves this optimal exponent by expending its abstention budget on statistically ambiguous instances. We further demonstrate that this polynomial-to-exponential improvement is exclusively a Bayesian phenomenon--in the frequentist setting, abstention only affects lower-order exponent terms. We also extend our results beyond the Gaussian model.

artificial intelligence, bayesian inference, machine learning, (19 more...)

2606.29203

Country: North America > United States > New York (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.92)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.67)

Vauthier, Christophe, Mérigot, Quentin, Korba, Anna

Highly Data Parallelizable Estimation of the Sliced-Wasserstein Distance Using Cumulative Distribution Functions

The Sliced Wasserstein (SW) distance has emerged as a computationally attractive alternative to the Wasserstein distance by leveraging one-dimensional optimal transport along random projections. Standard estimators of the SW distance rely on Monte Carlo averages of one-dimensional Wasserstein distances computed via quantile functions, which require sorting projected samples and access to full datasets. In this work, we introduce a new class of estimators for the Sliced Wasserstein distance based on cumulative distribution functions (CDFs) of projected measures that avoid sorting and scale via massive dataset parallelism. This class includes several estimators, some of which are indexed by hyperparameters controlling their variance or smoothness. We show that they are especially well suited to scenarios in which CDFs are more tractable than quantile functions, such as mixtures of Gaussians, and moreover that they are also naturally compatible with federated learning, since CDFs of projected data can be computed and aggregated locally without requiring the exchange of raw samples.
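The standard sorting-based Monte Carlo estimator that the abstract takes as its baseline can be sketched as follows. This is not the CDF-based estimator the paper proposes. The sketch assumes two point clouds of equal size and the squared 2-Wasserstein cost, and the names `random_direction` and `sliced_wasserstein_sq` are hypothetical.

```python
import math
import random

def random_direction(dim, rng):
    """Draw a direction uniformly on the unit sphere by normalizing a Gaussian vector."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]

def sliced_wasserstein_sq(xs, ys, n_projections, rng):
    """Monte Carlo estimate of SW_2^2 between two equal-size point clouds."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    total = 0.0
    for _ in range(n_projections):
        theta = random_direction(len(xs[0]), rng)
        # Sorting the projections matches empirical quantiles in one dimension.
        px = sorted(sum(c * t for c, t in zip(x, theta)) for x in xs)
        py = sorted(sum(c * t for c, t in zip(y, theta)) for y in ys)
        total += sum((a - b) ** 2 for a, b in zip(px, py)) / len(px)
    return total / n_projections

if __name__ == "__main__":
    rng = random.Random(0)
    xs = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(50)]
    ys = [[x[0] + 3.0, x[1] + 4.0] for x in xs]
    print(sliced_wasserstein_sq(xs, ys, n_projections=20, rng=rng))
```

The two `sorted` calls per projection need all projected samples in one place, which is the sorting and full-dataset requirement the abstract refers to.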

artificial intelligence, estimator, machine learning, (18 more...)

2606.3031

Country: North America > Canada > Ontario (0.28)

Genre: Research Report > Experimental Study (1.00)

Industry: Information Technology > Security & Privacy (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

On Local Population-Risk Certificates

Song, Mingzhi

We develop finite-sample certificates for local population-risk increments $Pδ_v=R(θ_0+v)-R(θ_0)$, $v\in\mathcal D$. The primitive object is an expected-valid upper endpoint $\widehat{\mathsf U}_{\mathcal D}$ satisfying $\mathbb E\sup_{v\in\mathcal D} \{Pδ_v-\widehat{\mathsf U}_{\mathcal D}(v)\}\le0$. This uniform criterion certifies any measurable update selected from the same sample and allows penalties to depend on empirical geometry. The main construction is a cross-fitted ridge calibration for linear feature classes. A pilot fold learns the ridge metric, the complementary fold calibrates the squared mean error in that metric, and complete split averaging recovers the full empirical covariance in the directional quadratic form $\widehat q_{X,λ}$. The optimized diagnostic scale is $\{\widehat q_{X,λ}(h) \widehat r_{X,n_{\rm p},λ}^{\rm cf}/n\}^{1/2}$, and the calibrated trace factor $\widehat r_{X,n_{\rm p},λ}^{\rm cf}$ is compared with the ordinary ridge effective dimension $\widehat r_{X,λ}$. For nonsmooth losses, an exact fixed-mask decomposition $δ_v=J_v^0+R_v^\circ+C_v$ separates frozen Taylor fluctuations, good-path remainders, and interface crossings. Applying the linear and composite certificates componentwise yields endpoints for same-sample expected local search and concentrated release rules.

artificial intelligence, certificate, machine learning, (18 more...)

2606.19147

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.66)

Variance Reduction for Stochastic Gradient Generalized Non-reversible Langevin Monte Carlo Algorithms

Ni, Bingye, Wang, Xiaoyu, Wang, Yingli, Zhu, Lingjiong

We study the leading-order fluctuation of stochastic gradient Euler-Maruyama estimators for generalized non-reversible Langevin dynamics. Under structural assumptions tailored to the small-stepsize central limit theorem and under an unbiased stochastic gradient oracle, we prove that the empirical average over a horizon of order the inverse squared stepsize satisfies a central limit theorem in the vanishing-stepsize regime. The limiting variance is characterized through the Poisson equation of the limiting full-gradient diffusion. We then rewrite this constant in an operator form that links it to the continuous-time asymptotic variance and, under standard operator-theoretic assumptions, derive a sufficient condition under which an anti-symmetric perturbation strictly reduces the leading-order fluctuation constant relative to the reversible baseline. We also identify bounded smooth predictive observables that are directly covered by the main theorem. As a separate Gaussian calculation beyond the bounded-test-function regime, we obtain closed-form formulas for quadratic Hamiltonians and linear observables. The framework covers non-reversible Langevin dynamics and augmented-state examples including Hessian-free high-resolution dynamics and a positive-definite subclass of gradient-adjusted underdamped Langevin dynamics that allow stochastic gradients. Numerical experiments on basic examples and Bayesian linear regression using synthetic data, and Bayesian logistic regression using real data support the predicted Gaussian fluctuations and show that the non-reversible schemes consistently reduce the root mean squared error (RMSE) relative to their reversible baselines.

artificial intelligence, assumption 2, machine learning, (16 more...)

2606.28808

Country:

Asia > China (0.46)
North America > United States (0.45)

Genre:

Research Report > New Finding (0.34)
Research Report > Experimental Study (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)